Selecting the number of components in principal component analysis using cross-validation approximations

Authors

  • Julie Josse
  • François Husson

Abstract

Cross-validation is a tried and tested approach to selecting the number of components in principal component analysis (PCA); however, its main drawback is its computational cost. In a regression (or nonparametric regression) setting, criteria such as generalized cross-validation (GCV) provide convenient approximations to leave-one-out cross-validation. They are based on the relation between the prediction error and the residual sum of squares weighted by elements of a projection matrix (or a smoothing matrix). Such a relation is then established for PCA, using an original presentation of PCA with a single projection matrix. It enables the definition of two cross-validation approximation criteria: the smoothing approximation of the cross-validation criterion (SACV) and the GCV criterion. The method is assessed with simulations and gives promising results. Crown Copyright © 2011 Published by Elsevier B.V. All rights reserved.
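To make the idea of a GCV-type criterion for PCA concrete, here is a minimal sketch in Python with NumPy. It is not taken from the paper. It assumes a GCV form GCV(S) = np·RSS(S) / (np − p − nS − pS + S² + S)², where RSS(S) is the residual sum of squares of a rank-S fit to the column-centered n × p data. In other words, it assumes that a centered rank-S PCA fit is charged p + nS + pS − S² − S parameters. The helper name `pca_gcv` and the synthetic data are hypothetical, and the SACV criterion is not shown. The sketch computes the criterion for S = 0, …, `max_comp` and keeps the S with the smallest value, which avoids refitting the model for every left-out cell.

```python
import numpy as np


def pca_gcv(X, max_comp):
    """GCV values for S = 0..max_comp PCA components (illustrative sketch).

    Assumption: GCV(S) = n*p*RSS(S) / (n*p - p - n*S - p*S + S**2 + S)**2,
    i.e. a centered rank-S fit is charged p + n*S + p*S - S**2 - S parameters.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)  # column centering (p of the counted parameters)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    values = []
    for S in range(max_comp + 1):
        # Residual degrees of freedom under the assumed parameter count
        df_res = n * p - p - n * S - p * S + S**2 + S
        if df_res <= 0:
            raise ValueError(f"max_comp={max_comp} leaves no residual degrees of freedom")
        # Rank-S reconstruction of the centered data (all zeros when S == 0)
        fit = (U[:, :S] * s[:S]) @ Vt[:S, :]
        rss = np.sum((Xc - fit) ** 2)
        values.append(n * p * rss / df_res**2)
    return np.array(values)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: a rank-2 signal plus unit-variance Gaussian noise
    X = rng.normal(size=(100, 2)) @ (3 * rng.normal(size=(2, 10))) + rng.normal(size=(100, 10))
    gcv = pca_gcv(X, max_comp=6)
    print("GCV by S:", np.round(gcv, 3))
    print("selected S:", int(np.argmin(gcv)))
```

The whole computation needs a single SVD, because every rank-S fit is a truncation of the same decomposition. That single SVD is where the saving over explicit leave-one-out cross-validation comes from. The denominator penalizes each added component, so the selected S stops growing once extra components only fit noise.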

Similar resources

Application of Information Complexity in Principal Component Regression Modeling of the Venturi Meter Drift

In principal component regression there is a problem of selecting the number of principal components to be retained in the model. Those principal components corresponding to near-zero eigenvalues can ruin the precision of the regression coefficients estimator and therefore must be eliminated from the model. However, when the eigenspectrum gradually decays, it is difficult to decide how many pri...

Developing and Validation of Moral Behavior Styles Inventory

Article history: Received date: 13 September, 2016 Review date: 2 October 2016 Accepted date:20 November 2016 Printed online: 5 January Purpose: The present study was done to introduce an efficient tool in the field of moral behavior. Material & Method: method of the study was correlational, its approach was test developing and its population was students of Islamic Azad University- Ast...

Feature selection using genetic algorithm for classification of schizophrenia using fMRI data

In this paper we propose a new method for classification of subjects into schizophrenia and control groups using functional magnetic resonance imaging (fMRI) data. In the preprocessing step, the number of fMRI time points is reduced using principal component analysis (PCA). Then, independent component analysis (ICA) is used for further data analysis. It estimates independent components (ICs) of...

Assessment of Cost Effectiveness of a Firm Using Multiple Cost Oriented DEA and Validation with MPSS based DEA

Data Envelopment Analysis (DEA) is a nonparametric tool for discriminating the best performers from a number of homogenous Decision Making Units (DMU). Cost oriented DEA models identify those best DMUs which run cost efficient process. This paper validates the outcome derived from the Ideal Frontier (mentioned in Sarkar. S (2014)) derived from non-central Principal Component Analysis and a slac...

Principal Component Analysis for Soil Conservation Tillage vs Conventional Tillage in Semi Arid Region of Punjab Province of Pakistan

Principal component analysis is a valid method used for data compression and information extraction in a given set of experiments. It is a well-known classical data analysis technique. There are a number of algorithms for solving the problems, some scaling better than others. Wheat ranks as the staple food of most of the nations as well as an agent of poverty reduction, food security and world ...

Journal:
  • Computational Statistics & Data Analysis

Volume 56

Publication year: 2012